Approximate policy iteration using regularised Bellman residuals minimisation
Authors

Abstract
In this paper we present an Approximate Policy Iteration (API) method called API-BRM, using a very effective implementation of incremental Support Vector Regression (SVR) to approximate a value function able to generalize in continuous (or large) state-space Reinforcement Learning (RL) problems. RL is a methodology able to solve complex and uncertain decision problems, usually modeled as Markov Decisi...
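As a rough illustration of the scheme the abstract describes, the following is a minimal sketch of approximate policy iteration with a regression-based value function. It assumes a toy one-dimensional task and uses scikit-learn's batch SVR in place of the incremental SVR of API-BRM; all constants and helper names are illustrative, not taken from the paper.

import numpy as np
from sklearn.svm import SVR

GAMMA = 0.95                      # discount factor (assumed)
ACTIONS = (-0.1, +0.1)            # two displacement actions on the line
N_SAMPLES, N_ITERS = 200, 5
rng = np.random.default_rng(0)

def step(s, a):
    # Toy dynamics: move along [0, 1]; reward 1 once the goal s >= 1 is reached.
    s2 = min(max(s + a, 0.0), 1.0)
    return s2, float(s2 >= 1.0)

def evaluate(policy, value):
    # Fit a fresh SVR to one-step Bellman targets r + gamma * V(s') under `policy`.
    states = rng.uniform(0.0, 1.0, size=N_SAMPLES)
    targets = []
    for s in states:
        s2, r = step(s, policy(s))
        targets.append(r + GAMMA * value(s2))
    model = SVR(kernel="rbf", C=10.0).fit(states.reshape(-1, 1), np.array(targets))
    return lambda s: float(model.predict(np.array([[s]]))[0])

def improve(value):
    # Greedy policy improvement using the known one-step model and current values.
    def policy(s):
        returns = [r + GAMMA * value(s2) for s2, r in (step(s, a) for a in ACTIONS)]
        return ACTIONS[int(np.argmax(returns))]
    return policy

value = lambda s: 0.0             # initial value-function estimate
policy = improve(value)
for _ in range(N_ITERS):          # alternate approximate evaluation and improvement
    value = evaluate(policy, value)
    policy = improve(value)
print(policy(0.2), round(value(0.9), 3))   # the learned policy should move right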
Similar resources

Approximate Policy Iteration using Large-Margin Classifiers
We present an approximate policy iteration algorithm that uses rollouts to estimate the value of each action under a given policy in a subset of states and a classifier to generalize and learn the improved policy over the entire state space. Using a multiclass support vector machine as the classifier, we obtained successful results on the inverted pendulum and the bicycle balancing and riding d...
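For comparison with the preceding abstract, here is a hedged sketch of that rollout-plus-classifier scheme: Monte Carlo rollouts score each action at a sample of states, and a multiclass SVM generalizes the greedy choice to the whole state space. The toy "move towards the origin" task, the horizon and all other constants are assumptions made for illustration.

import numpy as np
from sklearn.svm import SVC

GAMMA, HORIZON, N_ROLLOUTS, N_STATES = 0.95, 20, 5, 100
ACTIONS = np.array([[0.1, 0.0], [-0.1, 0.0], [0.0, 0.1], [0.0, -0.1]])
rng = np.random.default_rng(1)

def step(s, a_idx):
    # Noisy dynamics in the plane; reward is higher the closer the state is to the origin.
    s2 = s + ACTIONS[a_idx] + rng.normal(scale=0.01, size=2)
    return s2, -float(np.linalg.norm(s2))

def rollout_return(s, a_idx, policy):
    # Estimate Q(s, a) by a truncated rollout that follows `policy` after the first step.
    s, r = step(s, a_idx)
    ret, discount = r, GAMMA
    for _ in range(HORIZON - 1):
        s, r = step(s, policy(s))
        ret += discount * r
        discount *= GAMMA
    return ret

def improve(policy):
    # Label sampled states with their rollout-greedy action and fit a multiclass SVM.
    states = rng.uniform(-1.0, 1.0, size=(N_STATES, 2))
    labels = []
    for s in states:
        q = [np.mean([rollout_return(s, a, policy) for _ in range(N_ROLLOUTS)])
             for a in range(len(ACTIONS))]
        labels.append(int(np.argmax(q)))
    clf = SVC(kernel="rbf", C=10.0).fit(states, labels)
    return lambda s: int(clf.predict(s.reshape(1, -1))[0])

policy = lambda s: rng.integers(len(ACTIONS))   # start from a uniformly random policy
for _ in range(3):                              # a few rollout-based improvement rounds
    policy = improve(policy)
print(policy(np.array([0.5, 0.0])))             # should pick the action moving towards 0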
Approximate Modified Policy Iteration
Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that contains the two celebrated policy and value iteration methods. Despite its generality, MPI has not been thoroughly studied, especially its approximation form which is used when the state and/or action spaces are large or infinite. In this paper, we propose three implementations of approximate MPI (AMPI) that are exten...
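To make the relationship between the three methods concrete, a small tabular sketch of modified policy iteration follows; setting the number of partial-evaluation steps m to 1 recovers value iteration, and letting m grow recovers policy iteration. The random five-state, two-action MDP is an illustrative assumption, not an example from the paper.

import numpy as np

rng = np.random.default_rng(0)
S, A, GAMMA, M_STEPS = 5, 2, 0.9, 10
P = rng.dirichlet(np.ones(S), size=(S, A))     # P[s, a] = next-state distribution
R = rng.uniform(size=(S, A))                   # R[s, a] = expected immediate reward

V = np.zeros(S)
for _ in range(50):
    Q = R + GAMMA * np.einsum("sat,t->sa", P, V)   # one-step look-ahead values
    pi = Q.argmax(axis=1)                          # greedy policy w.r.t. current V
    # Partial evaluation: apply the Bellman operator of pi exactly M_STEPS times.
    for _ in range(M_STEPS):
        V = R[np.arange(S), pi] + GAMMA * np.einsum("st,t->s", P[np.arange(S), pi], V)
print("greedy policy:", pi, "values:", np.round(V, 3))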
Error Bounds for Approximate Policy Iteration
In Dynamic Programming, convergence of algorithms such as Value Iteration or Policy Iteration results, in discounted problems, from a contraction property of the back-up operator, guaranteeing convergence to its fixed point. When approximation is considered, known results in Approximate Policy Iteration provide bounds on the closeness to optimality of the approximate value function obtained by suc...
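As a reminder of the kind of guarantee this line of work refines, the classical sup-norm bound for approximate policy iteration (as given, e.g., by Bertsekas and Tsitsiklis) is reproduced below from memory rather than from the paper above; gamma is the discount factor, V_k the approximate value of policy pi_k, and V* the optimal value function.

% Classical sup-norm error bound for approximate policy iteration.
\limsup_{k \to \infty} \, \| V^{\pi_k} - V^{*} \|_{\infty}
  \;\le\; \frac{2\gamma}{(1-\gamma)^{2}}
  \, \limsup_{k \to \infty} \, \| V_k - V^{\pi_k} \|_{\infty}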
Approximate Policy Iteration with Demonstration Data
We propose an algorithm to solve uncertain sequential decision-making problems that utilizes two different types of data sources. The first is the data available in the conventional reinforcement learning setup: an agent interacts with the environment and receives a sequence of state transition samples alongside the corresponding reward signal. The second data source, which differentiates the s...
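Purely as an illustration of the two data sources this abstract mentions, and not of the paper's actual algorithm, the snippet below folds expert demonstrations into a classifier-based policy update by mixing them with interaction-derived (state, action) pairs and giving them a larger sample weight; every name, shape and weight here is an assumption.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)

# Source 1: (state, action) pairs distilled from agent-environment interaction
# (random placeholders stand in for rollout-greedy labels).
rl_states = rng.uniform(-1.0, 1.0, size=(80, 2))
rl_actions = rng.integers(0, 4, size=80)

# Source 2: (state, action) pairs provided by an expert demonstrator (placeholders).
demo_states = rng.uniform(-1.0, 1.0, size=(20, 2))
demo_actions = rng.integers(0, 4, size=20)

X = np.vstack([rl_states, demo_states])
y = np.concatenate([rl_actions, demo_actions])
w = np.concatenate([np.ones(80), 3.0 * np.ones(20)])   # weight demonstrations more heavily

policy = SVC(kernel="rbf", C=10.0).fit(X, y, sample_weight=w)
print(policy.predict(np.array([[0.3, -0.2]])))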
Journal

Journal title: Journal of Experimental & Theoretical Artificial Intelligence
Year: 2015
ISSN: 0952-813X, 1362-3079
DOI: 10.1080/0952813x.2015.1024494